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Abstract 

Symbolic  task  planning  enables  a  robot  to  make  high-level  deci¬ 
sions  toward  a  complex  goal  by  computing  a  sequence  of  actions  with 
minimum  expected  costs.  This  thesis  builds  on  a  single-robot  planning 
framework,  and  aims  to  address  two  issues:  (1)  lack  of  performance  in¬ 
formation  in  the  selection  of  planners  across  different  formalisms,  and 
(2)  time  complexity  of  optimal  planning  for  multiple  mobile  robots.  In 
this  thesis  we  first  investigate  the  performance  of  the  state-of-the-art 
solvers  of  Planning  Domain  Definition  Language  (PDDL)  and  Answer 
Set  Programming  (ASP)  in  robot  navigation  problems.  We  then  aim  to 
reduce  overall  costs  where  multiple  mobile  robots  may  block  in  narrow 
corridors  or  collaborate  to  open  doors.  It  is  challenging  to  model  such 
interactions  due  to  uncertain  delays  of  navigation  actions  in  populated 
areas.  This  paper  addresses  this  challenge  with  an  algorithm  which  cal¬ 
culates  the  conditional  distribution  of  plan  costs  for  each  robot,  given 
planned  actions  of  other  robots.  We  then  propose  an  iterative  con¬ 
ditional  planning  algorithm  to  efficiently  approximate  optimal  plans 
of  the  system.  Experiments  in  simulation  and  a  demonstration  on 
real  robots  show  that  the  algorithm  has  a  significant  advantage  over 
baselines  in  which  robots  plan  individually,  or  plan  together  without  a 
model  for  navigation  action  durations. 
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1  Introduction 


As  mobile  robots  become  capable  of  autonomous  navigation  for  extended 
periods  of  time  in  large  spaces,  it  is  often  inefficient  to  plan  the  complete 
trajectory  in  one  setting.  Some  robots  are  also  designed  to  perform  het¬ 
erogeneous  tasks,  such  as  picking  up  objects  and  interacting  with  humans 
along  the  way.  Symbolic  planning,  given  abstracted  descriptions  of  the  en¬ 
vironment,  allows  a  robot  to  break  down  a  complex  task  into  a  sequence 
of  symbolic  actions.  Planning  problems  are  often  represented  in  action  lan¬ 
guages,  which  formalize  the  state  transition  system  as  action  preconditions 
and  effects.  Many  symbolic  planners  solve  such  problems  by  heuristic-based 
search  of  the  state  space  specified  by  the  action  language  encoding.  Another 
approach  is  to  convert  action  programs  to  Answer  Set  Programming  (ASP) 
programs,  and  compute  answer  sets  with  techniques  in  satisfiability  test¬ 
ing  [12]  [24],  Both  approaches  are  able  to  consider  action  costs  and  compute 
the  optimal  plan,  so  the  robot  can  reach  the  goal  while  minimizing  the  use 
of  resources.  In  a  dynamic  environment  where  the  state  can  change  without 
involvement  of  the  robot,  for  instance,  where  doors  open  and  close  by  pedes¬ 
trians,  it  is  difficult  to  predict  the  exact  state.  At  planning  time,  the  robot 
may  use  defaults,  and  replan  if  further  sensing  negates  the  assumptions  of 
the  plan. 

When  multiple  robots  are  operating  in  the  same  space,  the  action  of  one 
robot  may  affect  the  others,  so  their  individual  optimal  plans  can  become 
overall  suboptimal.  For  example,  if  two  robots  decide  to  take  a  narrow 
corridor  in  opposite  directions,  they  will  block  each  other  in  the  corridor, 
and  both  robots  will  have  to  replan.  Unlike  the  state  of  a  door  which  is 
controlled  unpredictable  forces,  interactions  between  robots  can  be  predicted 
based  on  their  planned  actions.  One  conceivable  solution  is  to  encode  all 
robots  into  the  planning  domain,  along  with  action  effects  they  may  have 
on  each  other.  In  the  example  of  a  narrow  corridor,  if  we  explicitly  state 
that  navigating  through  the  corridor  has  the  effect  of  blocking  it  in  the 
opposite  direction,  a  symbolic  planner  can  search  for  collision-free  plans  for 
all  robots.  There  are,  however,  two  problems  with  this  approach.  First,  it 
is  not  scalable  to  the  number  of  robots.  Second,  the  corridor  is  blocked  only 
when  a  robot  is  inside,  and  the  entry  and  exit  time  may  vary  in  practice. 
Since  the  robot  has  to  move  around  humans  and  other  movable  objects 
in  a  populated  domain,  navigation  actions  can  be  delayed  by  an  uncertain 
amount  of  time.  Therefore,  the  goal  of  this  research  is  to  build  an  efficient 
multi-robot  task  planning  system  that  scales  well  with  the  number  of  robots 
while  addressing  uncertainties  in  navigation  time. 


2 


The  rest  of  this  thesis  is  organized  as  follows:  Section  2  introduces  the 
background  of  symbolic  task  planning,  the  development  of  action  languages 
and  planners,  and  their  counterparts  for  multi-robot  planning.  The  first 
part  of  Section  3  describes  the  existing  hardware  and  software  platform  of 
the  Building  Wide  Intelligence  (BWI)  robots,  a  group  of  autonomous  robots 
designed  to  interact  with  people  in  the  Computer  Science  building  of  UT 
Austin.  We  then  introduce  the  representation  of  an  example  office  domain  in 
action  languages,  and  select  a  fast  planner  based  on  an  empirical  comparison 
of  the  performance  of  state-of-the-art  solvers.  Section  4  focuses  on  solving 
the  problem  of  symbolic  planning  for  multiple  mobile  robots  with  our  itera¬ 
tive  conditional  planning  algorithms.  Finally,  we  evaluate  several  configura¬ 
tions  of  the  algorithm  in  simulations  of  the  office  domain,  and  demonstrate 
successful  implementation  on  real  robots. 

2  Background 

The  history  of  using  action  languages  to  describe  and  solve  automated  plan¬ 
ning  problems  dates  back  to  the  development  of  STRIPS  [9]  [13] .  One  of  the 
most  popular  action  languages  today  is  the  Planning  Domain  Definition  Lan¬ 
guage  (PDDL)  [26].  PDDL  is  developed  as  a  standard  formalism  to  compare 
planner  performances  in  International  Planning  Competition  (IPC)  [35]. 
Since  then,  many  planning  systems  have  been  developed  with  state-of-the- 
art  heuristics,  such  as  Fast-Forward  [17]  and  Fast-Downward  [16].  On  an¬ 
other  front,  efforts  have  been  made  to  improve  the  representation  capability 
of  action  languages  and  allow  more  general  knowledge  reasoning.  For  exam¬ 
ple,  language  BC  supports  the  representation  of  defaults  and  non-monotonic 
reasoning  [23].  Instead  of  explicitly  searching  the  state  space,  action  lan¬ 
guage  solving  can  translate  to  Answer  Set  Programming  [24],  a  language 
with  wide  applications  in  knowledge  representation  and  reasoning,  and  fast 
solvers  powered  by  satisfiability  checking  algorithms.  Task  planning  prob¬ 
lems  for  mobile  robots  can  be  described  in  action  languages,  such  as  PDDL 
and  BC,  and  then  solved  by  general  purpose  solvers  for  the  specific  lan¬ 
guage  [30]  [19]. 
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(b)  System  architecture  of  the  BWI  robot. 


3  Single-robot  Planning  System 

This  section  explains  how  the  BWI  robots  plan  individually  to  accomplish 
navigation  tasks.  The  system  is  the  outcome  of  ongoing  efforts  in  the  BWI 
lab  to  provide  a  reliable  research  platform.  The  ASP  formalization  pre¬ 
sented  in  Section  3.2.1  is  adapted  from  previous  work  by  Piyush  Khandel- 
wal,  Fangkai  Yang  and  Matteo  Leonetti  [39].  The  empirical  comparison  of 
planners  is  joint  work  with  Shiqi  Zhang,  Piyush  Khandelwal,  in  addition  to 
the  thesis  advisor,  Peter  Stone. 

3.1  System  Architecture 

As  shown  in  Figure  la,  the  hardware  architecture  is  built  upon  Segway 
Robotic  Mobility  Platform.  Their  sensors  include  Hokuyo  URG-04LX  laser 
range-finders,  and  Kinect  RGB-D  cameras  [14].  Figure  lb  shows  the  soft¬ 
ware  system  of  the  BWI  robots  which  uses  the  Robot  Operating  System 
(ROS)  [29].  The  system  integrates  a  symbolic  task  planner  to  make  high- 
level  decisions.  When  a  task  is  assigned  to  a  robot,  the  action  executor 
first  formalizes  the  goal,  and  sends  a  planning  request  to  the  planner.  The 
planner  computes  a  sequence  of  actions  based  on  action  rules  and  current 
knowledge  of  the  domain.  The  action  executor  then  carries  out  each  action 
by  invoking  the  corresponding  modules  in  Figure  lb  according  to  the  action 
type.  Navigation  actions  are  handled  by  the  logical  navigator.  It  trans¬ 
lates  symbolic  locations  to  physical  coordinates,  and  sends  the  coordinates 
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to  the  navigation  stack  which  ultimately  gives  velocity  commands  to  the 
base  controller.  The  logical  navigator  is  also  responsible  for  constructing 
the  symbolic  representation  of  sensor  data,  and  returning  new  observations 
when  each  navigation  action  finishes.  Other  types  of  actions  may  involve 
different  modules.  For  example,  if  the  action  is  to  ask  if  a  person  is  in  the 
current  room,  the  robot  may  speak  through  the  ROS  sound_play  node,  and 
wait  for  user  input  from  a  dialog  interface.  Furthermore,  observations  do 
not  necessarily  come  from  navigation  actions.  Various  sensor  data  can  be 
parsed  as  input  to  the  reasoner.  Hypothetically,  if  we  have  a  person  detec¬ 
tor  running  at  all  times,  it  may  update  locations  of  people  to  the  knowledge 
base. 

3.2  Choice  of  Planner 

The  purpose  of  this  section  is  to  investigate  how  planners  perform  in  a  robot 
navigation  domain  while  varying  features  of  planning  requests,  specifically 
the  number  of  objects  and  the  length  of  the  optimal  plan.  Unlike  classi¬ 
cal  planning  domains  such  as  Blocks  World,  realistic  planning  problems  of 
robots  often  involve  reasoning  about  many  objects  as  well  as  state  variables, 
and  indirect  or  recursive  action  effects.  We  capture  these  features  in  a  test 
domain  derived  from  the  environment  of  the  BWI  robots,  and  compare  the 
performance  of  two  state-of-the-art  ASP  and  PDDL  solvers. 

3.2.1  Domain  Description 

The  domain  resembles  an  office  environment  which  consists  of  rooms  that 
are  common  areas  or  cubicles,  and  offices  that  are  connected  to  each  other  via 
doors.  Figure  2  shows  an  example  floor  plan  of  the  described  environment. 
A  mobile  robot  can  autonomously  navigate  between  locations  unless  a  closed 
door  is  on  the  path.  Other  navigation  actions  include  approaching  a  door, 
and  opening  a  door.  The  exact  action  the  robot  performs  to  open  a  door  is 
domain  dependent,  either  by  using  a  robotic  arm  or  asking  for  help. 

The  domain  knowledge  is  formalized  with  predicates  in  ASP  and  PDDL. 
At  a  particular  timestep  of  a  plan,  the  value  of  every  predicate  together 
describes  the  full  state  of  the  domain.  To  make  a  fair  comparison,  we  use 
the  same  set  of  predicates  for  both  languages.  Time-dependent  predicates 
in  ASP  need  an  additional  parameter  for  the  timestep.  Furthermore,  each 
action  is  allowed  to  execute  under  the  same  condition,  and  makes  the  same 
changes  to  each  allowed  state,  regardless  of  the  language.  Table  1  lists  all 
the  non-action  predicates  in  this  domain. 
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Figure  2:  Small  office  domain. 


A  predicate  is  marked  as  static  if  its  value  does  not  change  after  initial¬ 
ization,  otherwise  the  value  of  the  predicate  depends  on  the  timestep.  In 
ASP,  the  timestep  is  modeled  using  the  integer  variable  n,  and  all  dynamic 
predicates  require  this  parameter.  An  inertial  predicate  inherits  the  value 
from  the  previous  timestep  unless  it  is  changed  explicitly.  All  PDDL  pred¬ 
icates  are  inertial  by  default.  In  ASP,  for  example,  the  inertial  property  of 
at  is  enforced  by  the  following  rule: 

at(R,n)  :-  at(R,n-l),  not  -at(R,n). 

The  predicate  canopen  describes  the  state  after  a  robot  moves  into  a 
position  to  sense  and  open  a  door.  Specifically,  the  robot  has  to  be  facing 
a  closed  door  before  opening  the  door.  The  predicate  is  not  inertial.  In  the 
PDDL  description,  (canopen  d)  is  explicitly  set  to  false  as  an  effect  of  every 
action  except  for  (approach  d). 

The  domain  also  has  a  recursive  predicate  accessible  which  describes 
the  accessibility  of  rooms.  Since  one  corridor  may  connect  to  multiple  open 
rooms,  moving  between  any  of  them  is  a  single  point-to-point  navigation 
action  without  going  through  doors.  The  recursive  property  can  be  described 
as,  if  R3  is  accessible  from  both  R1  and  R2,  then  R2  is  accessible  from  Rl.  The 
complete  and  formal  definition  of  the  predicate  accessible  in  ASP  is  as 
follows: 
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ASP 

PDDL 

Description 

Property 

hasdoor (R,D) 

(hasdoor  ?r  - 
room  ?d  -  door) 

Room  R  has  door  D. 

static 

connected(Rl ,R2) 

(connected  ?rl  - 
room  ?r2  -  room) 

Room  R1  is  connected  to 
room  R2  without  a  door. 

static 

at (R,n) 

(at  ?r  -  room) 

The  robot  is  in  room  R  at 
timestep  n. 

inertial 

visited(R,n) 

(visited  ?r  - 
room) 

The  robot  has  already 
visited  the  room  R  at 
timestep  n. 

inertial 

open(D ,n) 

(open  ?d  -  door) 

Door  D  is  open  at  timestep 

n. 

inertial 

canopen(D,n) 

(canopen  ?d  - 
door) 

The  robot  is  in  front  of 
door  D  at  timestep  n. 

dynamic 

accessible (R1 ,R2 ,n) 

(accessible  ?rl 

-  room  ?r2  - 
room) 

The  robot  can  navigate 
from  room  R1  to  room  R2 
with  a  single  navigation 
action. 

recursive 

Table  1:  Predicates  of  the  office  domain. 


accessible (R1 ,R2,n)  connected(Rl ,R2) . 

accessible (R1 ,R2,n)  open(D,n),  hasdoor (R1 ,D) ,  hasdoor (R2 ,D) . 
accessible (R1  ,R2,n)  accessible (R2 ,R1 ,n) . 

accessible (R1 ,R2,n)  accessible(Rl ,R3,n) ,  accessible(R2,R3,n) . 


The  equivalent  definition  of  accessible  in  PDDL  is  supported  as  a  de¬ 
rived  predicate: 


(: derived  (accessible  ?rl  -  room  ?r2  -  room) 

(or  (connected  ?rl  ?r2) 

(exists  (?d  -  door)  (and  (open  ?d) 

(hasdoor  ?rl  ?d) 

(hasdoor  ?r2  ?d))) 

(accessible  ?r2  ?rl) 

(exists  (?r3  -  room)  (and  (accessible  ?rl  ?r3) 

(accessible  ?r2  ?r3))))) 

There  are  three  actions  in  the  domain:  go_to,  approach,  and  open_door, 
with  the  following  definitions: 

•  go_to:  The  action  has  one  parameter:  a  room  R2.  During  execution, 
the  robot  follows  a  path  computed  by  a  lower  level  controller  to  the 
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location  of  R2.  The  precondition  is,  if  the  current  location  is  Rl,  the 
goal  location  R2  must  be  accessible  from  Rl  with  a  single  navigation 
action.  Once  the  robot  finishes  the  action,  it  is  no  longer  in  Rl,  and  R2 
has  been  visited.  The  precondition  and  effects  are  expressed  as  follows 
in  ASP: 


goto(R2,n),  at(Rl,n-l),  not  accessible(Rl ,R2,n-l) . 
at(R2,n)  goto(R2,n). 

-at(Rl,n)  goto(R2,n),  at(Rl,n-l),  Rl  !=  R2. 

visited(R2 ,n)  goto(R2,n). 

The  equivalent  definition  in  PDDL  is  as  follows: 

(: action  goto 

: parameters  (?r2  -  room) 
precondition  (exists  (?rl  -  room) 

(and  (at  ?rl) 

(accessible  ?rl  ?r2))) 

: effect  (and  (at  ?r2) 

(forall  (?rl  -  room) 

(when  (at  ?rl) 

(not  (at  ?rl)))) 

(visited  ?r2) 

(forall  (?dl  -  door) 

(not  (canopen  ?dl))))) 

approach:  The  action  allows  the  robot  to  move  in  front  of  and  face  a 
door  D,  so  the  robot  can  sense  if  the  door  is  open.  The  robot  has  to  be 
in  a  room  that  has  the  door  D  before  executing  approach  D.  The  rules 
are  represented  in  ASP  as: 

approach (D ,n) ,  at(Rl,n-l),  not  hasdoor (Rl ,D) . 
canopen(D,n)  approach (D ,n) . 

The  complete  PDDL  definition  of  the  action  is  as  follows: 

(: action  approach 

parameters  (?d  -  door) 
precondition  (exists  (?rl  -  room) 

(and  (at  ?rl) 

(hasdoor  ?rl  ?d) ) ) 

: effect  (and  (canopen  ?d) 

(forall  (?dl  -  door) 

(when  (not  (=  ?dl  ?d)) 

(not  (canopen  ?dl)))))) 


•  open-door:  This  action  specifies  that  the  robot  opens  a  door  D.  The 
precondition  is  that  the  robot  is  in  a  pose  that  allows  it  to  perform  the 
action.  The  effect  is  that  the  door  D  is  open.  The  implementation  of 
this  action  depends  on  robot  capabilities.  The  ASP  encoding  of  this 
action  is  as  follows: 

opendoor(D,n) ,  not  canopen(D ,n-l) . 
open(D,n)  opendoor(D,n) . 


Similarly,  the  action  is  defined  in  PDDL  as: 

(: action  opendoor 

: parameters  (?d  -  door) 

: precondition  (canopen  ?d) 

:  effect  (and  (open  ?d) 

(forall  (?dl  -  door) 

(not  (canopen  ?dl))))) 


3.2.2  Experiments 

The  experiments  evaluate  the  performance  of  two  solvers  with  various  con¬ 
figurations  of  robot  navigation  problems.  Two  configurations  of  domain 
sizes  are  used  in  the  experiments.  The  small  domain  has  4  offices  connected 
with  doors,  and  6  directly  connected  rooms.  The  large  domain  has  a  total 
of  15  rooms  in  which  6  are  offices  that  have  doors.  In  each  problem,  the 
robot  starts  in  the  corridor,  and  needs  to  visit  a  list  of  randomly  selected 
rooms.  The  plan  length  is  strongly  correlated  but  not  directly  proportional 
to  the  number  of  rooms  visited,  since  entering  rooms  connected  via  doors 
takes  more  steps.  In  order  to  capture  the  underlying  distribution,  we  repeat 
the  trials  50  times  for  each  number,  and  average  the  results. 

Since  International  Planning  Competition  (IPC)  in  2014  does  not  take 
into  account  or  announce  the  solving  time  of  each  planner,  PDDL  plan¬ 
ners  are  designed  to  use  all  the  allowed  time  (1800  seconds).  We  therefore 
select  the  winner  of  IPC  2011,  FastDownward  [16]  with  the  setting  LAMA- 
2011  [31],  which  supports  derived  predicates.  For  the  ASP  solver,  we  use 
the  incremental  mode  of  CLINGO-4.5.4  [12].  CLINGO  is  an  Answer  Set 
solving  system  that  integrates  CLASP,  the  winner  of  the  fifth  Answer  Set 
Programming  Competition  in  2015  [5].  Experiments  are  run  on  a  general- 
purpose  High  Throughput  Computing  (HTC)  cluster  using  only  machines 
with  at  least  8GB  memory. 

Figure  3  plots  average  planning  time  on  the  logarithmic  scale  with  stan¬ 
dard  error.  The  x-axis  shows  the  number  of  rooms  planned  to  visit.  We 
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Figure  3:  Performance  of  planners 


can  observe  that  when  fewer  than  8  rooms  are  visited,  the  ASP  solver  is 
much  faster  than  the  PDDL  planner,  and  finds  the  optimal  plan  in  almost 
real-time.  However,  the  slope  of  the  red  curves  is  smaller,  i.e.  the  PDDL 
planner  eventually  outperforms  the  ASP  solver  at  large  plan  length.  We  can 
also  observe  that  the  ASP  solver  is  less  sensitive  to  the  increase  in  domain 
size,  as  the  gap  between  blue  curves  is  smaller  than  the  gap  between  red 
curves. 

3.2.3  Summary 

Based  on  an  empirical  comparison  between  ASP  and  PDDL  solvers,  we  con¬ 
clude  that  ASP-based  planning  is  faster  at  solving  robot  navigation  prob¬ 
lems  in  large  domains,  whereas  the  PDDL  planner  is  better  for  problems 
that  require  long  plans.  The  environment  of  the  BWI  robots  involves  many 
rooms,  doors,  objects  and  people.  Any  useful  encoding  of  a  portion  of  the 
CS  building  exceeds  the  size  of  the  test  domains.  Furthermore,  task  plan¬ 
ning  is  at  the  most  abstract  level  of  their  decision  making  procedures,  so 
plans  are  more  likely  to  be  short.  According  to  the  findings  of  the  empirical 
comparison,  we  choose  ASP  for  encoding  the  domain,  and  CLINGO  as  the 
task  planner. 


10 


4  Multi-robot  Planning  Algorithm 


The  second  part  of  the  thesis  presents  a  multi-robot  task  planning  algorithm 
which  plans  for  a  team  of  mobile  robots  in  the  same  space.  In  addition  to 
the  thesis  advisor,  Peter  Stone,  this  research  is  joint  work  with  Shiqi  Zhang 
and  Guni  Sharon  [41]. 

4.1  Problem  Statement 

The  central  planner  takes  input  from  each  robot  of  a  planning  problem  that 
consists  of  the  following  information: 

•  goal  state 

•  current  state 

•  domain  description,  which  includes: 

—  definition  of  predicates  and  actions,  such  as  the  rules  introduced 
in  Section  3. 

—  static  objects  and  relations. 

—  a  distribution  for  the  duration  of  each  allowed  action. 

—  a  cost  function  that  assigns  a  cost  to  each  allowed  action.  For 
navigation  actions,  the  cost  is  often  defined  by  the  expected  du¬ 
ration. 

The  output  of  the  central  planner  is  a  set  of  plans  in  which  each  plan  is  a 
sequence  of  actions  for  a  robot.  The  goal  of  this  research  is  to  find  the  set 
of  plans  which  minimizes  the  expected  total  costs.  We  make  the  following 
assumptions: 

•  Each  robot  works  on  its  own  task.  Tasks  are  not  transferable. 

•  Robots  have  real-time  communications  with  the  central  controller. 

•  Interactions  among  robots  happen  in  two  cases.  First,  there  is  a  con¬ 
flict  when  two  robots  are  going  in  the  opposite  direction  in  a  narrow 
corridor.  Second,  robots  can  take  advantage  of  the  opendoor  action  of 
another  robot  by  following  it  through.  Note  that  the  plan  can  have 
non-navigation  actions,  but  interactions  only  occur  in  navigation. 
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4.2  High-level  Description  of  Our  Solution 

The  problem  of  finding  the  optimal  plans  for  multiple  mobile  robots  is  NP- 
hard  [36].  The  robots  will  not  be  able  to  interact  in  real-time  if  the  symbolic 
planner  has  to  solve  for  a  formalization  which  encodes  all  robots  at  once. 
In  order  to  approach  the  optimal  solution  in  a  reasonable  amount  of  time, 
we  first  solve  a  simplified  version  of  the  problem:  since  the  central  planner 
knows  the  planned  actions  of  all  robots  in  the  team,  when  a  robot  receives 
a  goal  and  joins  the  collective,  it  can  optimize  the  new  robot’s  behavior 
accordingly.  Therefore,  the  first  part  of  the  algorithm  focuses  on  how  the 
knowledge  of  external  actions  can  be  incorporated  into  the  planning  process. 
Section  4.3  shows  our  approach  to  adjust  the  cost  function  of  actions  to 
encode  the  predicted  effects  of  other  robots’  actions.  An  optimal  symbolic 
planner  can  then  compute  the  optimal  plan  based  on  the  new  cost  function. 
We  call  this  process  conditional  planning ,  and  the  optimal  plan  under  the 
knowledge  of  the  plans  of  other  robots  the  conditional  optimal  plan.  Section 
4.4  proposes  iterative  conditional  planning  (ICP)  algorithms  that  iteratively 
improves  the  plan  quality  with  conditional  planning. 

4.3  Conditional  Plan  Costs 

In  this  section,  we  investigate  how  conflicting  navigation  actions  and  sharing 
door  actions  affect  the  expected  cost  of  a  plan.  The  consequence  of  a  head  on 
confrontation  of  two  robots  in  a  narrow  corridor  is  modeled  by  a  large  con¬ 
stant  collision  cost  ji.  Since  temporal  uncertainties  propagate  from  previous 
delays  in  navigation  actions,  collisions  do  not  happen  deterministically.  We 
derive  the  expected  collision  cost  Ccomsion  for  any  pair  of  conflicting  actions 
that  navigate  the  same  corridor  in  opposite  directions. 

Let  Tf ,  Tf  be  the  start  and  completion  time  of  the  action  being  planned, 
and  Tf  >  Tf  be  the  start  and  completion  time  of  the  external  action.  Let  their 
probability  density  functions  (PDF)  be  fr* (i),  fr^(t),  and  /r2c(i). 

The  PDFs  are  propagated  from  distributions  of  previous  action  durations  in 
the  plan  (as  explained  in  Section  4.5.1).  Since  the  collision  will  not  happen 
if  one  robot  enters  the  corridor  after  the  other  robot  leaves,  the  collision  cost 
is  defined  by  the  following  piecewise  function  of  the  four  random  variables: 


cost  (collision  |  Tf,  Tf,  Tf ,  T2C) 


0  if  Tf  >  Tf  or  Tf  >  Tf, 
/x  otherwise. 


(1) 


C collision  is  equal  to  the  expected  value  of  the  cost  function: 
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(-'collision  =  E[cosf  (colhsion  |  Tf ,  Tf ,  T2C)] 

=  n  •  P( collision  |  Tf,  Tf,  Tf ,  Tf)  (2) 

=  n  •  (1  -  P(Tf  >  T2c)  -  P(Tf  >  Tf)) 

Since  Tf  is  independent  of  Tf,  and  Tf  is  independent  of  Tf,  it  is  easy 
to  calculate  the  joint  probabilities  by  integrating  the  product  of  individ¬ 
ual  PDFs.  Therefore,  CcoiuSion  of  the  action  pair  is  equal  to  the  following 
expression: 


T  '  (1  —  /  /  fT{(tl)fTc(t2)dtidt2—  /  /  fTc{tl)fT°{t2)dt2dti) 

Jo  Jt2  Jo  Jti 

(3) 

We  then  consider  the  behavior  of  sharing  doors.  The  action  wait_f  or_open 
allows  one  robot  to  wait  next  to  a  door  which  another  robot  is  trying  to 
open.  When  the  other  robot  completes  open-door,  the  waiting  robot  will 
move  closer  to  the  door  and  starts  going  through  as  soon  as  the  doorway  is 
clear.  The  action  is  defined  by  the  following  rules  in  ASP: 

waitf oropen(D ,n) ,  at(R,n-l),  not  hasdoor (R,D) . 
waitf  oropen(D ,n)  ,  (Kexopendoor  (R,D,  I)}-0  . 
open(D,n)  waitf oropen(D ,n) . 

facing(D,n)  waitf oropen(D ,n) . 

In  short,  wait_for_open  is  allowed  only  when  there  is  an  external  action 
that  opens  the  door,  and  the  robot  is  in  an  area  that  connects  to  the  door. 
When  the  action  finishes,  the  door  is  open,  and  the  robot  is  facing  the  door. 
The  timing  of  the  two  actions  is  again  very  important.  The  robot  should 
only  plan  to  wait  if  the  other  robot  can  get  a  head  start.  As  some  doors 
require  holding  to  remain  open,  the  waiting  robot  also  needs  to  arrive  before 
the  first  robot  finishes  open-door.  Failure  to  arrive  in  the  time  window  will 
force  the  robot  to  open  the  door  by  itself.  We  represent  the  penalty  of  such 
failure  by  oj,  which  is  often  the  same  as  the  cost  of  opening  the  door  again. 
Suppose  Tf  is  the  start  time  of  the  robot  action  wait_f  or_open,  Tf  and  Tf 
are  the  start  and  completion  time  of  the  external  open-door  action.  The  cost 
function  of  wait_for_open  is  defined  as: 


cost  (wait  |  Tf,  Tf ,  Tf ) 


u 

Tf  -  Tf 


if  Tf  <  Tf  or  Tf  >  Tf, 
if  Tf  <  Tf  <  Tf. 


(4) 
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The  expected  cost  of  waiting  Cwait  is  calculated  as  follows: 


Cwait  =  E[cosf(wait  |  Tf ,  Tf ,  T2C)] 

=  w  •  P(failure  |  Tf,  Tf,  Tf )  ^ 

/*00  /*00  /»t3 

+  /  /  /  &  —  ti)fT°(tl)fT$,T$(t2,t3)dtidt2dt3 

Jo  Jo  Jt2 

where  /t|,T|  (^2j  £3)  is  the  joint  density  function  of  when  the  other  robot 
starts  and  finishes  open_door.  If  the  cost  of  door  opening  is  modeled  as  a 
constant  without  uncertainty,  Tf  can  be  represented  by  Tf  to  eliminate  one 
variable  in  Equation  5. 

P  ( failure  |  Tf,  Tf,  Tf)  is  the  probability  of  arriving  outside  of  the  valid 
time  window,  given  by 


P (failure  |  Tf ,  Tf ,  Tf ) 


P (Tf  <  Tf )  +  P (Tf  >  Tf) 

fOC  ft2 

/tis(£i)/t|(£2)  dtidt-2 


10  JO 


+ 


> 0  Ji3 


fr?  (ti)  frs (^3)  dtidt3 


In  practice,  the  planner  can  approximate  Cwait  by 


(6) 


Cw  ps  E[Tf  -  Tf]  =  E[Tf]  -  E[Tf]  (7) 

while  using  a  threshold  on  P(failure  |  Tf ,  Tf ,  Tf)  to  account  for  temporal 
uncertainties. 

Algorithm  1  presents  a  function  Cp(p,  A)  that  computes  the  conditional 
expected  cost  of  a  plan  p,  given  a  set  A  of  navigation  actions  planned  by 
other  robots.  We  can  then  compute  the  conditional  optimal  plan  by  calling 
an  optimal  planner  to  minimize  Cp(p,A).  Note  that  the  planner  can  be 
chosen  according  to  features  of  the  planning  problem,  as  Section  3.2  shows. 

Note  that  the  plan  might  have  other  types  of  actions,  and  the  algorithm 
is  generalizable  to  other  kinds  of  interactions.  The  collision  cost  can  be  used 
wherever  a  pair  of  actions  are  not  allowed  to  overlap  in  time  due  to  limited 
resources.  We  can  also  adapt  the  door  sharing  calculations  to  cases  in  which 
actions  across  different  robots  have  to  happen  in  a  certain  order. 
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Algorithm  1  Computing  conditional  plan  cost 
Input:  Plan  p.  whose  cost  will  be  evaluated 

Input:  Set  of  external  actions  A,  on  which  the  cost  of  p  is  conditioned 
Output:  Cp :  overall  expected  cost  of  plan  p 
1:  Cp±-  0 

2:  for  each  a\  €  p  do 

3:  if  a\  is  not  wait_f or_open  then 

4:  Cp  i —  Cp  cost(fli) 

5:  end  if 

6:  for  each  a2  £  A  do 

7:  if  ci\  is  wait_f or_open(D)  and  02  is  open_door(D)  then 

8:  Cp  •<—  Cp  +  Cwait(ai,  a2) 

9:  else  if  <21  and  a-i  are  conflicting  navigation  actions  then 

10:  Cp  i  Cp  -\-  C collision  (ai,a2) 

11:  end  if 

12:  end  for 

13:  end  for 
14:  return  Cp 


4.4  Iterative  Conditional  Planning 

Our  next  step  is  to  apply  the  conditional  planning  method  of  each  robot 
to  approach  the  global  optimum  of  the  system.  We  present  two  iterative 
conditional  planning  (ICP)  algorithms:  Simple  ICP  and  Enhanced  ICP. 

4.4.1  The  Simple  ICP  Algorithm 

Algorithm  2  shows  a  basic  algorithm  which  iterates  through  every  robot, 
and  computes  the  optimal  plan  conditioned  on  the  plans  of  previous  robots. 
We  call  this  algorithm  Simple  Iterative  Conditional  Planning  (S-ICP). 


Algorithm  2  S-ICP  algorithm 

Input:  N :  number  of  robots 
Input:  Planning  problem  for  each  robot 
Output:  pi , P2 ,  *  *  *  ,pN 
1:  Initialize  an  empty  set  of  external  actions  A 
2:  for  each  j  €  {1,  2, . . . ,  Nj  do 

3:  pj  <r-  the  conditional  optimal  plan  under  the  objective  function  Cp(pj,A) 

4:  A  <—  A  U  {navigation  actions  in  pjj 

5:  end  for 

6:  return  pi,p2,  •  •  •  ,Pn 
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We  can  observe  that,  once  a  robot  selects  its  plan,  it  expands  the  set 
of  external  actions  for  all  previous  robots.  Under  the  new  condition,  their 
chosen  plans  can  be  sub-optimal.  Thus,  the  quality  of  the  output  depends 
on  the  ordering  of  robots.  One  way  to  improve  the  optimality  of  S-ICP  is  to 
try  all  orderings,  and  select  the  plans  with  the  lowest  expected  total  costs. 
We  call  this  modified  version  Brute-force  S-ICP. 

4.4.2  The  Enhanced  ICP  Algorithm 

We  propose  a  novel  algorithm  called  Enhanced  Iterative  Conditional  Plan¬ 
ning  (E-ICP)  which  repeats  the  conditional  planning  iterations  to  further 
reduce  total  costs  of  the  plans.  The  algorithm  is  inspired  by  simulated 
annealing,  a  well-known  iterative  algorithm  for  global  optimization.  The 
concept  of  “temperature”  in  simulated  annealing  is  integrated  as  the  level 
of  importance  of  other  robots,  or  negotiation  depth.  The  central  planner 
initially  accepts  worse  solutions  that  do  not  fully  adapt  to  plans  of  other 
robots,  and  moves  towards  complete  collaboration.  We  implement  the  ne¬ 
gotiation  depth  as  a  value  a  that  increases  from  0  to  1,  where  0  means  the 
plan  of  each  robot  is  computed  independently.  At  a  =  1,  the  plan  of  each 
robot  is  optimized  by  the  full  conditional  cost  (the  output  of  Algorithm  1). 

The  increase  of  the  importance  of  other  robots  is  achieved  by  adjusting 
their  effect  on  the  objective  function.  In  the  case  of  conflicting  navigation 
actions,  the  algorithm  is  increasingly  concerned  about  the  conflict  when  the 
negotiation  depth  increases.  Therefore,  we  multiply  the  collision  cost  in 
Equation  5  by  a.  The  same  reasoning  suggests  that  the  collaboration  on 
opening  doors  should  be  less  favorable  in  earlier  iterations,  with  the  effect  of 
external  open_door  actions  completely  neglected  in  the  first  iteration.  This 
is  implemented  by  elevating  the  cost  of  wait_f or_open  towards  the  cost  of 
opening  the  door  again.  The  central  planner  assumes  that  the  collaboration 
will  always  fail  when  a  =  0,  and  as  a  increases  to  1,  the  probability  of  failure 
decreases  linearly  from  1  to  the  value  calculated  by  Equation  6.  Algorithm  3a 
shows  the  adjusted  cost  function  which  incorporates  a  in  the  cost  of  collision 
and  the  cost  of  collaboration  failure. 

Algorithm  3b  presents  the  E-ICP  algorithm.  We  introduce  an  outer  loop 
which  iterates  from  0  to  a  parameter  0.  The  negotiation  depth  a  starts  at  0 
and  increases  by  a  step  size  of  g  until  it  reaches  1  in  the  final  iteration.  The 
inner  loop  iterates  through  each  robot,  computes  the  conditional  optimal 
plan ,  and  maintains  the  set  of  navigation  actions  currently  planned  by  other 
robots.  When  N  >  2,  the  algorithm  has  the  choice  of  conditioning  the 
plan  on  a  subset  of  the  team.  We  introduce  an  additional  parameter  M  to 


16 


represent  the  size  of  the  subset.  Line  6  requires  an  optimal  symbolic  task 
planner  to  minimize  the  objective  function  given  by  Algorithm  3a. 


Algorithm  3a  Computing  conditional  plan  cost  in  E-ICP 
Input:  Plan  p.  whose  cost  will  be  evaluated 

Input:  Set  of  external  actions  A ,  on  which  the  cost  of  p  is  conditioned 
Input:  a:  negotiation  depth 

Input:  ui  :  penalty  of  failure  (often  the  same  as  the  cost  of  opening  the  door  again) 
Output:  Cp :  overall  cost  of  plan  p 
1:  Cp  <r-  0 

2:  for  each  oi  €  p  do 

3:  if  a\  is  not  wait_f or_open  then 

4:  Cp  4 —  Cp  cost(cq) 

5:  end  if 

6:  for  each  <22  £  A  do 

7:  if  ai  is  wait_f or_open(D)  and  02  is  open_door(D)  then 

8:  Cp  4—  Cp  +  Cwait(ai,  <12)  +  (1  —  Oi)  •  U)  •  Pfail(aii  0,2) 

9:  else  if  ai  and  02  are  conflicting  navigation  actions  then 

10:  Cp  i  Cp  C  Oi  •  C collision  {a1,a2) 

11:  end  if 

12:  end  for 

13:  end  for 
14:  return  Cp 


Algorithm  3b  E-ICP  algorithm 

Input:  N :  number  of  robots 

Input:  Planning  problem  for  each  robot 

Input:  0:  number  of  rounds  of  “negotiations”,  0  >  0 

Input:  M:  number  of  robots  considered  in  conditional  planning,  M  <  N  —  1 
Output:  pi , P2 ,  *  *  *  ,pN 
1:  Initialize  an  empty  set  of  external  actions  A 
2:  for  each  i  £  {0, 1, . . . ,  0}  do 
3:  o  =  i/0 

4:  for  each  j  £  {1,2,...,  N}  do 

5:  A  4—  {navigation  actions  in  M  other  robots} 

6:  Pj  ■£-  the  conditional  optimal  plan  under  objective  function  Cp(pj,  A,  a) 

7:  end  for 

8:  end  for 

9:  return  pi,p2,  •  •  •  ,Pn 
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4.4.3  Complexity  and  Optimality  of  ICP 

The  Simple  Iterative  Conditional  Planning  algorithm  calls  the  symbolic 
planner  N  times.  There  are  N\  possible  orderings  of  the  N  robots.  There¬ 
fore,  the  time  complexity  of  Brute-force  S-ICP  is  0(N\  ■  N )  with  respect 
to  a  single  operation  of  conditional  planning.  Since  Enhanced  ICP  has  0 
rounds  of  conditional  planning,  its  complexity  is  0(Q-N).  As  the  number  of 
robots  in  the  team  grows,  E-ICP  becomes  a  much  more  efficient  algorithm 
than  Brute-force  S-ICP. 

Neither  Brute-force  S-ICP  nor  E-ICP  always  outputs  the  optimal  solu¬ 
tion.  Further,  both  algorithms  can  be  outperformed  by  the  other.  E-ICP 
often  finds  better  plans  if  several  rounds  of  negotiations  are  required  to  reach 
the  global  optimum.  In  such  problems,  since  S-ICP  only  plans  once  for  each 
robot,  every  ordering  generates  suboptimal  plans.  In  general,  ICP  algo¬ 
rithms  are  suboptimal  because  robots  are  “selfish”  in  conditional  planning 
-  each  robot  receives  the  best  plan  for  itself  under  information  about  the 
others.  E-ICP  can  get  stuck  at  a  local  optimum  when  one  robot  has  a  plan 
that  can  lower  the  cost  of  the  system,  but  has  no  incentive  to  switch  to  it. 
S-ICP  (with  best  ordering)  can  sometimes  correct  this  problem  by  forcing 
that  “selfish”  robot  to  plan  last. 

4.5  Evaluation 

In  this  section,  we  evaluate  the  proposed  E-ICP  algorithm  in  the  office  do¬ 
main  of  the  BWI  robots.  To  fully  instantiate  the  planning  problem  stated 
in  Section  4.1,  we  first  assign  each  navigation  action  a  model  of  its  dura¬ 
tion.  Quantitative  experiments  are  conducted  with  a  realistic  multi-robot 
simulator  GAZEBO  [22]  and  an  abstract  simulator.  We  also  demonstrate 
the  successful  implementation  of  the  algorithm  in  a  trial  with  real  robots. 

4.5.1  Navigation  Duration 

To  model  the  noisy  duration  of  navigation  actions  in  the  test  domain,  we 
make  two  additional  assumptions: 

1.  Navigation  actions  are  delayed  by  independent  appearances  of  humans 
at  a  constant  rate  A. 

2.  Each  person  needs  a  fixed  time  <5  to  pass  the  blocking  zone.  In  other 
words,  each  appearance  delays  the  action  by  6. 
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Following  the  first  assumption,  the  number  of  delays  per  unit  time  can  be 
modeled  by  a  Poisson  distribution  with  parameter  A. 

Proposition:  If  X  and  Y  are  independent  Poisson  random  variables: 
Poisson(Xi)  and  Y~Poisson( A2),  then  their  sum  Z  =  X+Y  also  follows  a 
Poisson  distribution:  Z~Poisson( A1+A2)  [21]. 

Suppose  the  navigation  action  a  takes  tact{a)  units  of  time  at  full  speed 
without  delays.  Following  the  proposition,  we  can  model  the  number  of 
delays  N delay («)  by  a  Poisson  distribution  with  parameter  A  •  tact{a).  In 
other  words,  the  action  duration  t(a)  can  be  modeled  by 

^(u)  —  t acti} “1“  3  '  X delayij^)  ($) 

where  Xdeiay(a)~ Poisson(X  ■  tact{a )). 

Therefore,  the  model  of  navigation  duration  requires  two  domain  param¬ 
eters  A  and  5,  and  the  acting  time  of  each  action  tact . 

Suppose  a  plan  has  the  sequence  of  actions  ai,  02,  ■  ■  ■  ,  an.  For  any  action 
ak,  we  can  derive  the  distribution  of  its  start  time  from  the  parameters 
of  previous  actions  ai,a2,  •  •  •  ,  afe-i-  The  distribution  is  required  by  Equa¬ 
tions  3  and  5  to  compute  the  conditional  plan  cost. 


k- 1 


k- 1 


T%  =  tactfai)  +  ^  ‘  Ndelay{ai)  (9) 

2=0  2=0 

k~  1 

In  general,  the  PDF  of  the  sum  Ndeiay(ai)  is  the  convolution  of  each 


i= o 


k- 1 


PDF  [28].  As  Ndeiay(a)~  Poisson(X  ■  tact(a)),  the  sum  £  X delay  ulso 


i= 0 


follows  a  Poisson  distribution  with  parameter  A  •  Yli=o  tact.(ai)-  Let  t'act  = 
J2i=o  ^act(fli) ■  Plugging  in  the  probability  mass  function  (PMF)  of  Poisson 
distribution,  we  have 


P(Ei=0  N delay  (®i)  =  x)  =  (Yfact)xe 


-P-Ct) 


'/*! 


(10) 


where  ]Y;=o  Xdeiay{&i )  =  Tk~t'act/8.  The  distribution  of  T|  is  defined  by: 

0  if  t<  C.t, 


nn  =  t)  =  < 


(A< .ctY^-tact/s)  '  exp(-\-t'act)  -r  .  ./ 

(*-t'act/sy.  >  tact ’ 


(11) 
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4.5.2  Gazebo  Simulation 

We  use  the  GAZEBO  simulator  to  compare  the  performance  of  E-ICP  with 
2  rounds  of  conditional  planning  (0  =  2)  against  the  baseline  where  plans 
are  generated  independently.  Figure  4  (left)  shows  a  map  of  the  testing 
environment.  S±  and  S2  (green  rectangles)  are  the  initial  positions  of  two 
robots,  and  G \  and  G2  (red  ellipses)  show  their  goal  positions  respectively. 
The  two  robots  are  likely  to  collide  without  collaboration  in  planning.  In 
this  case,  both  S-ICP  and  E-ICP  compute  the  globally  optimal  plan:  one  of 
the  robots  changes  the  direction  of  its  path,  which  is  longer  in  distance  but 
no  longer  causes  conflicts. 

Artificial  human  walkers  are  introduced  in  the  environment  with  behav¬ 
ior  stated  in  Section  4.5.1,  simulating  temporal  uncertainties  in  navigation 
actions.  Figure  4  (right)  shows  a  human  walker  blocking  the  path  of  a  robot 
in  GAZEBO. 

Figure  5  plots  the  actual  plan  execution  time  of  each  robot  in  45  trials. 
Table  2  reports  the  average  and  standard  deviation  of  the  results.  We  can 
observe  that  E-ICP  reduces  both  the  overall  cost  and  the  variance  compared 
to  the  baseline.  When  each  robot  plans  independently,  the  execution  time 
has  a  wider  spread  because  the  consequence  of  the  collision  depends  on 
the  exact  position  of  the  collision,  and  which  robot  turns  around  first.  For 
example,  the  blue  squares  on  the  left  happen  when  R2  goes  around  after  the 
collision,  while  R1  keeps  going  on  the  shortest  path. 

4.5.3  Abstract  Simulation 

The  physics  engine  and  visualization  of  GAZEBO  help  verify  that  E-ICP  is  a 
realistic  and  significant  improvement  from  the  baseline.  To  run  a  larger  scale 
of  experiments,  we  design  an  abstract  simulator  that,  instead  of  simulating 
navigation  and  human  walkers,  samples  navigation  costs  directly  from  our 
model  in  Section  4.5.1.  Figure  6  is  a  map  of  the  simulation  domain.  The 
cost  of  opening  each  door  co  is  12,  and  the  collision  cost  //  is  40.  We  use 
the  abstract  simulation  environment  to  evaluate  three  aspects  of  E-ICP:  the 
effect  of  negotiation  rounds  (0),  the  necessity  of  modeling  uncertainties  in 
navigation  durations,  and  the  significance  of  the  subset  size  (M)  in  a  team 
larger  than  2  robots.  There  is  no  direct  comparison  between  Brute-foce  S- 
ICP  and  E-ICP  using  this  simulator,  since  results  from  one  domain  will  not 
be  representative  of  their  average  performances. 
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Figure  4:  GAZEBO  simulation  environment  (and  a  picture  of  a  human 
walker  blocking  a  robot). 
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Figure  5:  Costs  of  robots  i?l  and  R2  in  45  trials  collected  using  GAZEBO 
simulation  environment. 
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Table  2:  Average  time  and  standard  deviation  from  GAZEBO  simulation 
experiments. 
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Evaluating  the  effect  of  0  We  compare  the  performance  of  plans  gen¬ 
erated  under  0  =  0,  1,  and  2.  At  0  =  0,  the  algorithm  has  one  cycle  of 
planning  with  a  =  0,  so  the  plans  are  equivalent  to  the  individual  optimal 
plans  without  collaboration.  At  0  =  1,  the  algorithm  performs  one  round  of 
independent  planning,  and  one  round  of  fully  conditional  planning  at  a  =  1, 
a  hybrid  of  no  collaboration  and  S-ICP.  At  0  =  2,  we  have  three  rounds  of 
negotiation  at  a  =  0,  1  and  2. 

Figure  7  reports  the  average  execution  costs  and  the  average  number  of 
successful  door  collaborations  from  50  trials.  In  all  the  trials,  R1  and  R2 
need  to  move  from  <72  to  d9  and  from  <77  to  <74,  respectively.  The  tasks  are 
assigned  so  that  there  is  risk  of  collision  (between  <73  and  <78),  and  potential 
for  collaboration  on  opening  doors.  The  relative  starting  time  is  varied  in 
these  trials.  We  can  observe  that,  when  one  robot  starts  much  earlier  than 
the  other,  the  three  configurations  have  almost  the  same  performance.  In 
cases  where  the  two  robots  receive  the  tasks  around  the  same  time,  0  > 
0  performs  significantly  better  than  0  =  0.  Further,  the  extra  round  of 
negotiation  at  0  =  2  enables  more  door  collaborations  than  E-ICP  with 
0  =  1. 

Evaluating  the  necessity  of  modeling  temporal  uncertainties  In 

the  next  set  of  experiments,  R1  plans  to  go  from  <73  in  corrl  to  <74,  while  R2 
plans  to  move  from  d7  to  d5.  There  is  a  risk  for  R2  to  give  up  its  shortest 
path  and  follow  R1  through  d3.  If  there  is  any  delay  in  the  navigation  from 
<77  to  <73,  R2  will  have  to  open  the  door  again.  Figure  8  shows  that,  modeling 
the  risk  of  failure  in  door  collaborations  (Equation  6)  improves  the  overall 
quality  of  solutions  by  E-ICP. 
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-e-6=0  (No  comm)  -^e=1  (S-ICP)  -^6=2  (E-ICP)H 


Figure  7:  Planning  for  a  two-robot  system  (evaluating  0). 


Figure  8:  Planning  for  a  two-robot  system  (evaluating  the  need  of  modeling 
temporal  uncertainties). 
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Evaluating  the  effect  of  M  Experiments  with  a  team  of  three  robots  are 
conducted  in  the  same  simulation  environment.  The  conditional  planning 
step  can  consider  the  previous  robot  (M  =  1),  or  all  other  robots  (M  =  2). 
We  evaluate  the  effect  of  M  on  the  performance  of  E-ICP  at  0  =  1  and 
0  =  2.  In  these  experiments,  Rl,  R2  and  R3  need  to  move  from  dl  to  d9, 
from  d6  to  d4,  and  from  dlO  to  d8  (cor  1  side),  respectively.  Each  robot  can 
receive  its  task  at  time  0,  15  or  30,  generating  19  distinctive  initial  states. 
Table  3  shows  the  mean  and  standard  deviation  of  the  overall  costs,  based 
on  the  19  states,  and  50  trials  for  each  state.  The  increase  of  M  significantly 
lowers  overall  costs  (p- value  is  0.03  at  0  =  1,  and  0.02  at  0  =  2). 

4.5.4  Robot  Demonstration 

Conducting  quantitative  experiments  with  multiple  real  robots  can  be  very 
costly  (in  terms  of  time  and  possible  damage  of  collisions).  Instead,  we 
present  the  following  illustrative  trial. 

Figure  9  shows  the  initial  (51  and  52)  and  goal  positions  (G1  and  G 2) 
on  the  real  navigation  map  of  the  BWI  robots.  Without  collaboration  in 
task  planning,  the  robots  will  both  have  to  find  someone  to  open  Dl  and 
D2.  The  E-ICP  algorithm  enables  the  second  robot  to  take  advantage  of 
the  door  opened  by  the  first  robot  (Dl).  Since  the  first  robot  is  expected 
to  arrive  at  Dl  early,  it  can  go  around  asking  people  while  the  second  robot 
navigates  from  52  to  Dl.  A  video  of  this  trial  is  available  at:  https:// 
youtu . be/ ADbH3sppLHQ 
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Number  of  teammates  considered  in  conditional  planning 

M  =  1 

M  =  2 

0  =  1 

207.81  (66.24) 

179.28  (9.81) 

0  =  2 

205.35  (62.25) 

171.03  (9.99) 

Table  3:  Mean  and  standard  deviation  values  of  the  four  configurations. 


Figure  9:  Floor  map  of  the  real-world  environment. 


4.6  Summary 

This  section  proposes  the  iterative  conditional  planning  algorithm  to  solve 
the  problem  of  optimal  planning  in  an  environment  with  multiple  mobile 
robots.  The  algorithm  has  two  steps:  1)  adjusting  the  objective  function  in 
single-robot  planning  to  model  influence  of  other  robots,  and  2)  an  iterative 
algorithm  that  uses  step  1  to  approach  global  optimum.  A  distribution  of 
navigation  durations  is  derived  for  the  BWI  robots,  and  used  to  evaluate 
different  configurations  of  the  algorithm. 
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5  Related  Work 


Planner  competition  and  comparison:  Comparisons  of  planning  algo¬ 
rithms  have  been  mostly  provided  by  the  International  Planning  Competi¬ 
tion  [35]  in  the  standardized  language  PDDL  [26].  The  Answer  Set  Pro¬ 
gramming  competition  evaluates  answer  set  solvers  on  a  variety  of  domains, 
in  which  some  of  them  are  planning  problems  [5].  However,  these  compe¬ 
titions  do  not  evaluate  solvers  across  different  action  languages,  or  analyze 
domain  features  that  affect  the  performance  of  different  modeling  and  solv¬ 
ing  strategies.  Some  heuristics  of  SAT-based  planning  have  been  compared 
to  PDDL  planners  [32],  However,  there  has  not  been  a  direct  comparison 
of  ASP-based  planning  and  PDDL-based  planning,  especially  in  a  realistic 
robotics  domain  that  requires  recursive  action  effects. 

Temporal  planning  with  uncertainties:  PDDL  has  been  extended  to 
consider  numeric  properties  including  durations  of  actions  in  PDDL2.1  [11], 
and  temporal  planning  problems  have  been  included  in  International  Plan¬ 
ning  Competitions.  Uncontrollable  action  durations  have  been  expressed 
as  intervals  of  time,  in  order  to  find  “strong”  plans  that  are  guaranteed  to 
work  [6]  [27].  Our  approach  to  planning  with  noisy  action  durations  differ 
from  these  methods  since  1)  we  allow  domain-dependent  probability  density 
functions,  and  2)  we  aim  to  minimize  overall  expected  costs  for  a  team  of 
robots. 

Multi-agent  planning:  Previous  work  on  multi-agent  planning  has  fo¬ 
cused  on  the  representation  of  concurrent  actions  in  PDDL  and  other  for¬ 
malisms  [2]  [1] .  When  combined  with  temporal  planning,  certain  extensions 
of  action  languages  and  solvers  are  capable  of  planning  for  multiple  agents 
while  considering  action  durations  [3]  [7].  However,  the  direct  represen¬ 
tation  of  concurrent  action  effects  significantly  increases  the  complexity  of 
planning  [33].  This  thesis  proposes  an  algorithm  that  efficiently  plans  for  a 
team  of  robots  by  modeling  concurrent  action  effects  as  conditional  costs. 
The  algorithm  floats  above  the  representation  in  action  languages  and  the 
implementation  of  solvers.  Therefore,  we  can  select  a  fast  classical  planner 
based  on  features  of  the  planning  domain,  as  shown  in  the  first  part  of  this 
thesis. 

Applications  of  task  planning  on  multiple  robots:  Symbolic  task 
planning  has  been  applied  to  a  team  of  robots  in  many  domains  [4,  20,  37,  38, 
10].  These  applications  either  assume  no  run-time  conflicts  [4,  20],  or  focus 
on  run-time  negotiation  [37,  38,  10],  as  opposed  to  predicting  interactions 
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at  planning  time. 

Multi-robot  probabilistic  planning:  In  contrast  to  symbolic  planning, 
(PO)MDP-based  planning  has  also  been  explored  for  multi-robot  task  plan¬ 
ning.  Related  topics  include:  planning  with  concurrent  actions  [25,  34], 
planning  under  temporal  uncertainty  [15,  40],  incorporating  temporal  logic 
into  navigation  task  planning  [8],  and  planning  for  multi-robot  systems  us¬ 
ing  a  single  (PO)MDP  [18],  multiple  (PO)MDPs  [42],  The  (PO)MDP-based 
algorithms  model  non-deterministic  action  outcomes  and  aim  to  optimize 
long-term  expected  reward.  The  symbolic  planning  paradigm  is  fundamen¬ 
tally  different  since  the  goal  is  to  generate  interpretable  plans  with  minimal 
expected  costs,  assuming  deterministic  action  outcomes. 

6  Conclusion  and  Future  Work 

This  thesis  first  investigates  the  performance  of  state-of-the-art  PDDL  and 
ASP  planners  in  navigation  domains.  The  empirical  comparison  shows  that 
the  PDDL  planner  is  faster  at  computing  long  plans,  whereas  the  ASP  solver 
scales  better  with  number  of  objects.  We  use  the  ASP  solving  system  Clingo 
for  planning  in  the  domain  of  the  BWI  robots.  Based  on  the  existing  single¬ 
robot  planning  platform,  the  second  part  of  the  thesis  presents  an  algorithm 
to  efficiently  improve  overall  plan  quality  for  a  team  of  robots. 

Possible  future  work  includes  generalizing  the  conditional  planning  to 
enable  a  larger  variety  of  collaborations,  and  improving  the  ICP  algorithm 
for  better  efficiency  and  optimality.  This  thesis  models  navigation  durations, 
and  conditional  costs  between  navigation  actions.  In  the  real  environment, 
temporal  uncertainty  also  arises  in  actions  that  wait  for  human  input,  and 
collaborations  may  be  needed  in  non-navigation  tasks,  such  as  handing  off 
objects  to  another  robot.  Further,  current  evaluation  methods  are  insuffi¬ 
cient  to  compare  average  performances  of  Brute-force  S-ICP  and  E-ICP.  We 
may  observe  that  the  performance  of  our  solution  depends  on  the  accuracy 
of  estimated  conditional  costs,  as  well  as  the  optimality  of  the  iterative  al¬ 
gorithm.  Assuming  a  cost  model,  the  evaluation  of  Brute-force  S-ICP  and 
E-ICP  does  not  require  executing  generated  plans,  since  plan  quality  can 
be  determined  by  the  expected  cost.  Given  more  time,  I  would  like  to  com¬ 
pare  the  optimality  of  the  two  ICP  algorithms  in  a  large  number  of  domain 
configurations,  and  propose  improvements  on  the  algorithm. 
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